Abstract — Distributed storage systems often introduce 
redundancy to increase reliability. When coding is used, 
the repair problem arises: if a node storing encoded 
information fails, in order to maintain the same level of 
reliability we need to create encoded information at a new 
node. This amounts to a partial recovery of the code, 
whereas conventional erasure coding focuses on the com- 
plete recovery of the information from a subset of encoded 
packets. The consideration of the repair network traffic 
gives rise to new design challenges. Recently, network 
coding techniques have been instrumental in addressing 
these challenges, establishing that maintenance bandwidth 
can be reduced by orders of magnitude compared to 
standard erasure codes. This paper provides an overview 
of the research results on this topic. 

Index Terms — Distributed storage, erasure coding, net- 
work coding, interference alignment, multicast. 

I. Introduction 

In recent years, the demand for large scale data storage 
has increased significantly, with applications like social 
networks, file, and video sharing demanding seamless 
storage, access and security for massive amounts of data. 
When the deployed storage nodes are individually unre- 
liable, as is the case in modern data centers and peer-to- 
peer networks, redundancy must be introduced into the 
system to improve reliability against node failures. The 
simplest and most commonly used form of redundancy 
is straightforward replication of the data in multiple 
storage nodes. However, erasure coding techniques can 
potentially achieve orders of magnitude more reliability 
for the same redundancy compared to replication (see 
e.g. [2]). To realize the increased reliability of coding 
however, one has to address the challenge of maintaining 
an erasure encoded representation. 
Fig. 1. A (4,2) MDS binary erasure code (Evenodd code [10]). 
Each storage node (box) is storing two blocks that are linear binary 
combinations of the original data blocks A1, A2, B1, B2. In this 
example the total stored size is M = 4 blocks. Observe that any 
k = 2 out of the n = 4 storage nodes contain enough information 
to recover all the data. 



Given two positive integers k and n > k, an (n, k) 
maximum distance separable (MDS) code can be used 
for reliability: initially the data to be stored is separated 
into k information packets. Subsequently, using the MDS 
code, these are encoded into n packets (of the same size) 
such that any k out of these n suffice to recover the 
original data (see Figure 1 for an example). 

MDS codes are optimal in terms of the redundancy- 
reliability tradeoff because k packets contain the min- 
imum amount of information required to recover the 
original data. In a distributed storage system the n 
encoded packets are stored at different storage nodes 
(e.g., disks, servers or peers) spread over a network, 
and the system can tolerate any (n - k) node failures 
without data loss. Note that throughout this paper we 
will assume a storage system of n storage nodes that 
can tolerate (n - k) node failures and use the idea of 
sub-packetization: each storage node can store multiple 
sub-packets that will be referred to as blocks (essentially 
using the idea of array codes [10], [11]). 

The benefits of coding for storage are well known 
and there has been a substantial amount of work in 
the area. Reed-Solomon codes [6] are perhaps the most 
popular MDS codes and, together with the very similar 
information dispersal algorithm (IDA) [7], have been 
investigated in distributed storage applications (e.g. [3], 
[5]). Fountain codes [8] and LDPC codes [9] are recent 
code designs that offer approximate MDS properties and 
low encoding and decoding complexity. Finally there has 



been a large body of related work on codes for RAID 
systems and magnetic recording (e.g. see [10]-[13] and 
references therein). 

In this tutorial we focus on a new problem that arises 
when storage nodes are distributed and connected in a 
network. The issue of repairing a code arises when a 
storage node of the system fails. The problem is best 
illustrated through the example of Figure 2. Assume a 
file of total size M = 4 blocks is stored using the 
(4, 2) Evenodd code of the previous example and the 
first node fails. A new node (to be called the newcomer) 
needs to construct and store two new blocks so that 
the three existing nodes combined with the newcomer 
still form a (4, 2) MDS code. We call this the repair 
problem and focus on the required repair bandwidth. 
Clearly, repairing a single failure is easier than recon- 
structing all the data: since by assumption any two 
nodes contain enough information to recover all the data, 
the newcomer could download 4 blocks (from any two 
surviving nodes), reconstruct all four blocks and store 
A1, A2. However, as the example shows, it is possible 
to repair the failure by communicating only three blocks 
B2, A2 + B2, A1 + A2 + B2, which can be used to solve 
for A1, A2. 
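
The repair of the first node can be checked mechanically. The following is a minimal sketch, assuming that each block is a byte string of equal length and that addition over GF(2) is bytewise XOR; the helper names `xor` and `repair_node1` are illustrative and not part of the original text.

```python
def xor(x: bytes, y: bytes) -> bytes:
    """Add two equal-length blocks over GF(2), i.e. bytewise XOR."""
    return bytes(a ^ b for a, b in zip(x, y))


def repair_node1(b2: bytes, a2_b2: bytes, a1_a2_b2: bytes):
    """Rebuild (A1, A2) from the three downloaded blocks
    B2, A2 + B2 and A1 + A2 + B2."""
    a2 = xor(a2_b2, b2)        # (A2 + B2) + B2 = A2
    a1 = xor(a1_a2_b2, a2_b2)  # (A1 + A2 + B2) + (A2 + B2) = A1
    return a1, a2


# Illustrative data blocks (arbitrary values chosen for the demo).
A1, A2, B2 = b"\x0f\x10", b"\xa5\x01", b"\x3c\xff"
downloads = (B2, xor(A2, B2), xor(xor(A1, A2), B2))
assert repair_node1(*downloads) == (A1, A2)
```

Only three blocks cross the network here, compared with four for the naive approach that first reconstructs the whole file.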

Figure 3 shows the repair of the fourth storage node. 
This can be achieved by using only three blocks [14], 
but one key difference is that the second node needs 
to compute a linear combination of the stored packets 
B1, B2 and the actual communicated block is B1 + B2. 
This clearly shows the necessity of network coding, i.e., 
creating linear combinations in intermediate nodes during 
the repair process. If network bandwidth is a more 
critical resource than disk access, as is often 
the case, an important consideration is to find 
the minimum required bandwidth and which codes can 
achieve it. 

The repair problem and the corresponding regenerat- 
ing codes were introduced in [24] and received some 
attention in the recent literature [25]-[27], [31]-[38]. 
Somewhat surprisingly, these new code constructions can 
achieve a rather significant reduction in repair network 
bandwidth, compared with the straightforward applica- 
tion of Reed-Solomon or other existing codes. In this 
paper we provide an overview of this recent work and 
discuss several related research problems that remain 
open. 

A. Various Repair Models 

In the repair examples shown in Figures 2 and 3, the 
newcomer constructs exactly the two blocks that were in 
the failed node. Note however that our definition of repair 
Fig. 2. Example of an (exact) repair: Assume that the first node 
in the previous storage system failed. The goal is to repair the 
failure by creating a new node (the newcomer) that still forms a (4,2) 
MDS code. In this example it is possible to obtain exact repair by 
communicating 3 blocks, which is the information theoretic minimum 
cut-set bound. 




Fig. 3. Repairing the last node: in some cases it is necessary 
for storage nodes to compute functions of their stored data before 
communicating, as shown in the second node. 



only requires that the new node, when combined with the 
existing nodes, preserves the (n, k) MDS code property 
(that any k nodes out of n suffice to recover the 
original data). In other words, the new node could 
form new linear combinations that are different 
from the ones in the lost node, a requirement that is 
strictly easier to satisfy. 

Three versions of repair have been considered in the 
literature: exact repair, functional repair, and exact 
repair of systematic parts. In exact repair, the failed blocks 
are regenerated exactly, so the lost encoded blocks are 
restored as exact replicas. In functional 
repair, the requirement is relaxed: the newly generated 
blocks can contain different data from that of the failed 
node as long as the repaired system maintains the MDS- 
code property. The exact repair of the systematic part 
is a hybrid repair model lying between exact repair and 
functional repair. In this hybrid model, the storage code 
is always a systematic code (meaning that one copy of 
the data exists in uncoded form). The systematic part 
is exactly repaired upon failures and the non-systematic 
part follows a functional repair model where the repaired 
Fig. 4. Various repair models and the key constructive techniques. Exact repair: interference alignment, network coding. Exact repair of systematic part: interference alignment, network coding. 

version may be different from the original copy. See 
Figure 4 for an illustration. Notice that we do not know 
if the repair bandwidth for the three cases can be made 
equal or not (so the subsets are not necessarily strict). 

There is one important benefit in keeping the code 
in systematic form: as shown in Figure 1, if the code 
contains the original data as a subset, reading parts of the 
data can be performed very quickly by just accessing the 
corresponding storage node without requiring decoding. 
Interestingly, as we will see, exact repair, which is the 
most interesting problem in practice, is also the most 
challenging one, and determining a large part of the 
achievable region remains open. 

The functional repair problem is completely understood 
because, as shown in [24], it can be reduced to 
a multicasting problem on an appropriately constructed 
graph called the information flow graph. The pioneering 
work of Ahlswede et al. [15] characterized the multicasting 
rates by showing that cut-set bounds are achievable. 
Further work showed that linear network coding 
suffices [16], [18] and random linear combinations construct 
good network codes with high probability [19]. See also 
the survey [21] and references therein. Since functional 
repair is reduced to multicasting, we can completely 
characterize the minimum repair bandwidth by evalu- 
ating the min-cut bounds and network coding provides 
effective and constructive solutions. In Section II we 
present the results that characterize the achievable func- 
tional repair region and show a tradeoff between storage 
and repair bandwidth. 

The exact repair problem is harder than the functional 
repair problem. In exact repair, the new node accesses 
some existing storage nodes and exactly reproduces the 
lost coded blocks. As will be described in the sequel, 
repair codes come with fundamental tradeoffs between 
storage cost and repair bandwidth. The two important 
special cases involve operating points corresponding to 
maximal storage and minimal bandwidth versus minimal 



storage with maximal bandwidth. Exact repair for 
the minimal bandwidth operating point is described in 
Section III-A, which covers the recent work of [33] 
that develops optimal exact repair codes for this 
operating point without any loss of optimality with respect 
to functional repair. 

The special case of the operating point that 
corresponds to minimal storage, which also corresponds 
to minimizing the repair bandwidth while keeping the 
same storage cost as MDS codes, turns out to be more 
challenging. In this case, the new node 
needs to recover part of the data which is interfered 
with by the other data. It is the need to carefully 
handle interference that makes the problem difficult. The 
constructive techniques perform algebraic alignment so 
that the effective dimension of unwanted information 
is reduced, thus reducing the repair traffic. These 
constructive techniques, building on the known alignment 
concept, characterize the repair bandwidth for low-rate 
codes (k/n ≤ 1/2) and constitute achievable schemes 
for the whole range of parameters. It remains open, however, 
whether the cut-set bounds are achievable for the whole range 
of parameters. 

The exact repair of systematic parts model is a relax- 
ation of the exact repair model. As in the exact repair 
model, the core constructive techniques are interference 
alignment and network coding. In Section IV we shall 
see that this relaxation addresses some problem space 
not covered by exact repair. 

II. Model I: Functional Repair 

As shown in [24], the functional repair problem can 
be represented as multicasting over an information flow 
graph. The information flow graph represents the evo- 
lution of information flow as nodes join and leave the 
storage network (see also [23] for a similar construction). 
Figure 5 gives an example information flow graph. In 
this graph, each storage node i is represented by a pair 
of nodes, x_in^i and x_out^i, connected by an edge whose 
capacity is the storage capacity of the node. There is 
a virtual source node s corresponding to the origin of 
the data object. Suppose initially we store a file of size 
M = 4 blocks at four nodes, where each node stores 
α = 2 blocks and the file can be reconstructed from any 
2 nodes. Virtual sink nodes called data collectors connect 
to any subset of k nodes and ensure that the code has the 
MDS property (that any k out of n nodes suffice to recover the file). 
Suppose storage node 4 fails. The goal is to create a new 
storage node, node 5, which downloads the minimum 
amount of information and then stores α = 2 blocks. 
This is represented in Figure 5 by the unit-capacity edges 
x_out^1 → x_in^5, x_out^2 → x_in^5, and x_out^3 → x_in^5 that enter node x_in^5. 
Fig. 5. Illustration of the information flow graph G corresponding 
to the (4,2) code of Figure 1. A distributed storage scheme uses 
a (4, 2) erasure code in which any 2 nodes suffice to recover the 
original data. If node x^4 becomes unavailable and a new node joins 
the system, we need to construct new encoded blocks in x^5. To do so, 
node x_in^5 is connected to the d = 3 active storage nodes. Assuming 
β bits communicated from each active storage node, of interest is 
the minimum β required. The min-cut separating the source and the 
data collector must be at least M = 4 blocks for regeneration to 
be possible. For this graph, the min-cut value is given by α + 2β, 
implying that communicating β ≥ 1 block is sufficient and necessary. 
The total repair bandwidth to repair one failure is therefore γ = dβ = 
3 blocks. 



The functional repair problem for distributed storage 
can be interpreted as a multicast communication prob- 
lem defined over the information flow graph, where 
the source s wants to multicast the file to the set 
of all possible data collectors. For multicasting, it is 
known that the maximum multicast rate is equal to 
the minimum-cut capacity separating the source from 
a receiver and it can be achieved using linear network 
coding [16]. Since the current problem can be viewed 
as a multicast problem, the fundamental limit can be 
characterized by the min-cuts in the information flow 
graph and network coding provides effective constructive 
solutions. One complication is that since the number of 
failures/repairs is unbounded, the resulting information 
flow graph can grow unbounded in size. Hence we have 
to deal with cuts, flows, and network codes in graphs 
that are potentially infinite. 
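
The min-cut reasoning for the single-repair graph of Figure 5 can be reproduced with a small max-flow computation. The sketch below is illustrative only: the graph encoding, the node labels (`in1`, `out1`, `dc`, ...) and the helper names are assumptions made for this example, and the max-flow routine is a plain Edmonds-Karp implementation rather than anything prescribed by [24].

```python
from collections import deque
from itertools import combinations

INF = float("inf")


def max_flow(cap, s, t):
    """Edmonds-Karp max-flow; cap is a dict-of-dicts of edge capacities."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)  # reverse residual edge
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= push
            res[b][a] += push
        flow += push


def fig5_graph(alpha, beta, collector):
    """Flow graph after node 4 fails and node 5 downloads beta from nodes 1-3."""
    cap = {"s": {}, "dc": {}}
    for i in (1, 2, 3, 4):
        cap["s"][f"in{i}"] = INF
        cap[f"in{i}"] = {f"out{i}": alpha}  # storage edge of capacity alpha
        cap[f"out{i}"] = {}
    cap["in5"] = {"out5": alpha}
    cap["out5"] = {}
    for i in (1, 2, 3):
        cap[f"out{i}"]["in5"] = beta        # repair download edges
    for i in collector:
        cap[f"out{i}"]["dc"] = INF          # data collector reads k = 2 nodes
    return cap


def worst_case_cut(alpha, beta):
    """Smallest min-cut over all data collectors on the active nodes."""
    active = (1, 2, 3, 5)
    return min(max_flow(fig5_graph(alpha, beta, pair), "s", "dc")
               for pair in combinations(active, 2))


assert worst_case_cut(2, 1) == 4    # alpha + 2*beta reaches M = 4
assert worst_case_cut(2, 0.5) < 4   # beta < 1 cannot support M = 4
```

The binding data collector is one that reads the newcomer together with one of the nodes that helped repair it, which is where the value α + 2β comes from.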

In Section II-A we present the cut analysis of 
information flow graphs [24], [25]. In Section II-B we 
discuss two extreme points corresponding to minimum 
repair bandwidth and minimum storage cost, respectively 
(arguably interesting cases). 

A. Cut Analysis of Information Flow Graphs 

By analyzing the connectivity in the information flow 
graph, we can derive fundamental performance bounds 
about codes. In particular, if the minimum cut between s 
and a data collector is less than the size of original file, 
then we can conclude that it is impossible for the data 
collector to reconstruct the original file. In this section 



we review the cut analysis of [24], [25]. The setup is 
as follows: there are always n active storage nodes. 
Each node can store α bits. An information flow graph 
(as illustrated by Figure 5) corresponds to a particular 
evolution of the storage system after a certain number 
of failures/repairs. We call each failure/repair a "stage"; 
in each stage, a single storage node fails and the code 
gets repaired by downloading β bits each from any d 
surviving nodes. Therefore the total repair bandwidth is γ = dβ. 

See Figure 5 for an example. In the initial stage, 
the system consists of nodes 1, 2, 3, 4; in the second 
stage, the system consists of nodes 1, 2, 3, 5. For each 
set of parameters (n, d, α, γ = dβ), there is a family 
of finite or infinite information flow graphs, each of 
which corresponds to a particular evolution of node 
failures/repairs. We denote this family of directed acyclic 
graphs by G(n, d, α, γ). We restrict our attention to the 
symmetric setup where it is required that any k storage 
nodes can recover the original file, and a newcomer 
receives the same amount of information from each 
of the existing nodes. An (n, k, d, α, γ) tuple is 
feasible if a code with storage α and repair bandwidth γ 
exists. For the example in Figure 2, the total file has size 
M = 4 blocks and the point (n = 4, k = 2, d = 3, α = 
2 blocks, γ = 3 blocks) is feasible. By contrast, a 
standard erasure code which communicates the whole 
data object would correspond to γ = 4 blocks instead. 
Note that n, k, d must be integers. If there is one failure, 
the newcomer can connect to at most the n - 1 
surviving nodes, so d ≤ n - 1, and α, β, γ = dβ are 
the non-negative real valued parameters of the repair 
process. 

Theorem 1: For any α ≥ α*(n, k, d, γ), the points 
(n, k, d, α, γ) are feasible and linear network codes 
suffice to achieve them. It is information theoretically 
impossible to achieve points with α < α*(n, k, d, γ). 
The threshold function α*(n, k, d, γ) is the following: 
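
$$\alpha^*(n,k,d,\gamma) = \begin{cases} \dfrac{M}{k}, & \gamma \in [f(0), +\infty), \\[4pt] \dfrac{M - g(i)\,\gamma}{k-i}, & \gamma \in [f(i), f(i-1)), \quad i = 1, \ldots, k-1, \end{cases}$$

with

$$f(i) = \frac{2Md}{(2k-i-1)\,i + 2k(d-k+1)}, \qquad g(i) = \frac{(2d-2k+i+1)\,i}{2d},$$

where d ≤ n - 1. Given (n, k, d), the minimum repair 
bandwidth γ is 

$$\gamma_{\min} = f(k-1) = \frac{2Md}{2kd - k^2 + k}.$$
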
Fig. 6. Optimal tradeoff curve between storage α and repair bandwidth γ, for k = 5, n = 10. Here M = 1 and d = n - 1. Note that 
traditional erasure coding corresponds to the point (γ = 1, α = 0.2). 



One important observation is that the minimum repair 
bandwidth γ = dβ is a decreasing function of the 
number d of nodes that participate in the repair. As 
the newcomer communicates with more nodes, the size 
of each communicated packet β shrinks fast 
enough to make the product dβ decrease. Therefore, 
the minimum repair bandwidth is achieved when 
d = n - 1. 

As we mentioned, code repair can be achieved if and 
only if the underlying information flow graph has 
sufficiently large min-cuts. This condition leads to the repair 
rates computed in Theorem 1, and when these conditions 
are met, simple random linear combinations will suffice 
with high probability as the field size over which coding 
is performed grows, as shown by Ho et al. [19]. The 
optimal tradeoff curve for k = 5, n = 10, d = 9 is 
shown in Figure 6. 
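
The storage threshold can also be explored numerically. The sketch below assumes the min-cut condition analyzed in [24], [25], namely that the sum over i = 0, ..., k - 1 of min{(d - i)β, α} must be at least M; this expression is an assumption brought in for the example, and the function names are hypothetical. Exact rational arithmetic is used so that the corner points come out without rounding.

```python
from fractions import Fraction


def min_cut(alpha, beta, k, d):
    """Assumed worst-case min-cut: sum_{i=0}^{k-1} min((d - i) * beta, alpha)."""
    return sum(min((d - i) * beta, alpha) for i in range(k))


def alpha_star(M, k, d, gamma):
    """Smallest storage alpha with min_cut >= M for repair bandwidth gamma.

    Returns None when gamma is too small for any alpha to work.
    """
    beta = Fraction(gamma) / d
    caps = [(d - i) * beta for i in range(k)]  # decreasing in i
    if sum(caps) < M:
        return None                            # even unlimited storage fails
    alpha = Fraction(M) / k
    if alpha <= caps[-1]:
        return alpha                           # every term equals alpha
    for m in range(k - 1, 0, -1):
        # On [caps[m], caps[m-1]] the cut equals m*alpha + sum(caps[m:]).
        alpha = (M - sum(caps[m:])) / m
        if caps[m] <= alpha <= caps[m - 1]:
            return alpha
    return None


# The (n, k, d) = (4, 2, 3) example with M = 4 blocks.
assert alpha_star(4, 2, 3, 3) == 2
assert min_cut(2, 1, 2, 3) == 4
# The k = 5, d = 9, M = 1 setting of Figure 6 at gamma = 1.
assert alpha_star(1, 5, 9, 1) == Fraction(1, 5)
```

Sweeping `gamma` over a grid and plotting `alpha_star` would trace a curve of the kind shown in Figure 6 under the stated assumption.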



B. Two Special Cases 

It is of interest to study the two extremal points 
on the optimal tradeoff curve, which correspond to 
the best storage efficiency and the minimum repair 
bandwidth, respectively. We call codes that attain these 
points minimum-storage regenerating (MSR) codes and 



minimum-bandwidth regenerating (MBR) codes, respec- 
tively. 

From Theorem 1 it can be verified that the minimum 
storage point is achieved by the pair 

$$(\alpha_{MSR}, \gamma_{MSR}) = \left(\frac{M}{k},\ \frac{Md}{k(d-k+1)}\right).$$

As discussed, the repair bandwidth γ_MSR = dβ_MSR 
is a decreasing function of the number of nodes d that 
participate in the repair. Since the MSR codes store M/k 
bits at each node while ensuring the MDS-code property, 
they are equivalent to standard MDS codes. Observe that 
when d = k, the total communication for repair is M 
(the size of the original file). Therefore, if a newcomer 
is allowed to contact only k nodes, it must 
download the whole data object to repair one failure; 
this is the naive repair method that can be performed 
for any MDS code. 

However, by allowing a newcomer to contact more than 
k nodes, MSR codes can reduce the repair bandwidth 
γ_MSR, which is minimized when d = n - 1: 

$$(\alpha_{MSR}, \gamma_{MSR}^{\min}) = \left(\frac{M}{k},\ \frac{M}{k}\cdot\frac{n-1}{n-k}\right).$$

We have separated the M/k factor in γ_MSR^min to illustrate 
that MSR codes communicate a factor of (n - 1)/(n - k) more than 
what they store. This represents a fundamental expansion 
necessary for MDS constructions that are optimal on the 
reliability-redundancy tradeoff. For example, consider an 
(n, k) = (14, 7) code. In this case, the newcomer needs 
to download only M/49 bits from each of the d = n - 1 = 
13 active storage nodes, making the repair bandwidth 
equal to (13/7) · (M/7). Notice that we need only an expansion 
factor of 13/7, while a factor of 7 is required for the naive 
repair method. 

At the other end of the tradeoff are MBR codes, which 
have minimum repair bandwidth. It can be verified that 
the minimum repair bandwidth point is achieved by 

$$(\alpha_{MBR}, \gamma_{MBR}) = \left(\frac{2Md}{2kd-k^2+k},\ \frac{2Md}{2kd-k^2+k}\right).$$

Note that in minimum bandwidth regenerating codes, 
the storage size α is equal to γ, the total number of bits 
communicated during repair. If we set the optimal value 
d = n - 1, we obtain 

$$(\alpha_{MBR}^{\min}, \gamma_{MBR}^{\min}) = \left(\frac{M}{k}\cdot\frac{2n-2}{2n-k-1},\ \frac{M}{k}\cdot\frac{2n-2}{2n-k-1}\right). \qquad (8)$$

Notice that α_MBR^min = γ_MBR^min. MBR codes incur no repair 
bandwidth expansion at all, just like a replication system 
does, downloading exactly the amount of information 
stored during a repair. However, MBR codes require 
an expansion factor of (2n - 2)/(2n - k - 1) in the amount of stored 
information and are no longer optimal in terms of their 
reliability for the given redundancy. 
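
The two extremal points can be evaluated with a few lines of arithmetic. The sketch below assumes the closed forms (M/k, Md/(k(d - k + 1))) for the MSR point and 2Md/(2kd - k^2 + k) for both coordinates of the MBR point, as in [24], [25]; the function names are hypothetical and the file size M = 49 is chosen only so that the (14, 7) numbers are integers.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """(alpha, gamma) at the minimum-storage point, under the assumed closed form."""
    alpha = Fraction(M) / k
    gamma = Fraction(M) * d / (k * (d - k + 1))
    return alpha, gamma


def mbr_point(M, k, d):
    """(alpha, gamma) at the minimum-bandwidth point; both coordinates coincide."""
    gamma = Fraction(2) * M * d / (2 * k * d - k * k + k)
    return gamma, gamma


n, k, M = 14, 7, 49
alpha, gamma = msr_point(M, k, d=n - 1)
assert gamma / (n - 1) == Fraction(M, 49)      # beta = M/49 per helper node
assert gamma / alpha == Fraction(13, 7)        # expansion with d = n - 1
assert msr_point(M, k, d=k)[1] == M            # naive repair downloads M
assert msr_point(M, k, d=k)[1] / alpha == 7    # expansion factor of 7

# MBR storage overhead relative to M/k when d = n - 1.
assert mbr_point(M, k, d=n - 1)[0] / alpha == Fraction(2 * n - 2, 2 * n - k - 1)
```

The last check compares the general-d expression against the d = n - 1 form of (8), which is a quick way to confirm that the two are consistent.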

III. Model II: Exact Repair 

As we discussed, the repair-storage tradeoff for 
functional repair can be completely characterized by 
analyzing the cut-sets of the information flow graphs. However, 
as mentioned earlier, functional repair is of limited 
practical interest since there is a need to maintain the code 
in systematic form. Also, under functional repair, 
significant system overhead is incurred in order to continually 
update repairing-and-decoding rules whenever a failure 
occurs. Moreover, the random network coding based 
solution for functional repair can require a huge 
finite-field size to support a dynamically expanding graph size 
(due to continual repair). This can significantly increase 
the computational complexity of encoding and decoding. 
Furthermore, functional repair is undesirable in storage 
security applications in the face of eavesdroppers. In 
this case, information leakage occurs continually due to 
the dynamics of repairing-and-decoding rules that can 
be potentially observed by eavesdroppers [40]. These 
drawbacks motivate the need for exact repair of failed 


nodes. This leads to the following question: is it possible 
to achieve the cut-set lower bound region presented, with 
the extra constraint of exact repair? 

Recently, significant progress has been made on the 
two extreme (and arguably most interesting) points of the 
family of regenerating codes: the MBR point [33] and 
the MSR point [31], [34], [35]. The authors in [33] 
showed that for d = n - 1 (the interesting case), the 
optimal MBR point can be achieved with a deterministic 
scheme requiring a small finite-field size and repair 
bandwidth matching the cut-set bound of (8). 

For the MSR point, [31] showed that it can be attained 
for the cases of k = 2 and k = n - 1 when d = n - 1. 
Subsequently, the authors in [34] established that for k/n > 
1/2 + 2/n, cut-set bounds cannot be achieved for exact repair 
under scalar linear codes (i.e., β = 1) where symbols are 
not allowed to be split into arbitrarily small sub-symbols 
as with vector linear codes.¹ For large n, this case boils 
down to k/n > 1/2. For k/n ≤ 1/2, whether or not exact repair 
comes with a non-zero gap from cut-set bounds remained 
an open problem. 

Recently, the authors in [35] showed that Exact-MSR 
codes can match the cut-set bound for the MSR point for the case 
of k/n ≤ 1/2 and d ≥ 2k - 1.² For the in-between regime 
k/n ∈ (1/2, 1/2 + 2/n], [32] and [35] showed that cut-set bounds 
are achievable for the case of k = 3. For the most 
general Exact-MSR case, finding the fundamental limits 
in storage and repair bandwidth for all values of (n, k, d) 
remains a challenging open problem. We now briefly 
summarize some of these recent results. 



A. Exact-MBR Codes 

Theorem 2 (Exact-MBR Codes [33]): For d = n - 1, 
the cutset lower bound of (8) can be achieved with a 
deterministic scheme that requires a finite-field alphabet 
size of at most $\binom{n}{2}$. 

Figure 7 illustrates the idea through the example of 
(n, k, d, α, γ) = (5, 3, 4, 4, 4) where the maximum file 
size of M = 9 (matching the cutset bound) can be 
stored. Let a be a 9-dimensional data file. Each node 
stores 4 blocks of the form a^t v_i, where v_i can be 
interpreted as a one-dimensional subspace of the data file. 
For brevity we write only the subspace vector to represent an 
¹This is equivalent to having large block-lengths in the classical 
setting. Under non-linear and vector linear codes, tightness of cut-set 
bounds remains open. 

²The idea was inspired by the code structure in [34] where exact 
repair is guaranteed for the systematic part only. Indeed, it is shown 
in [35] that the code introduced in [34] for exact repair of only 
the systematic nodes can also be used to repair the non-systematic 
(parity) node failures exactly provided repair construction schemes 
are appropriately designed. 
Fig. 7. Repairing node 1 for a (5, 3)-MBR code. Note that the 
number of desired blocks (that need to be repaired) is equal to the 
number of available equations (that can be downloaded). Hence, the 
code should be designed such that undesired blocks (interference) are 
totally avoided. 



actually stored block. Notice that the degree d is equal 
to the number of storage blocks to be repaired, i.e., the 
number of available equations matches the number of 
desired variables for exact repair of a single node. Hence, 
for exact repair, there must be at least one duplicated 
block between node 1 and node i for all i ≠ 1. 

This observation motivates the following idea: have 
each other node i (i ≠ 1) store one block of 
node 1, so that nodes 2, 3, 4, and 5 store a^t v_1, a^t v_2, 
a^t v_3, and a^t v_4 in their first positions, respectively. Notice that 
for ensuring repair, it suffices to have only one duplicated 
block between any two storage nodes. Hence, node 2 
can store three new blocks a^t v_5, a^t v_6 and a^t v_7 
in its remaining positions. In accordance with the 
above procedure, nodes 3, 4, and 5 then each copy one of 
these three blocks, respectively. We repeat this 
procedure until 10 (= 4 + 3 + 2 + 1) blocks are stored 
in total. One can see that this construction guarantees 
exact repair of any failed node, since at least one block 
is duplicated between any two storage nodes and 
the duplicated blocks are distinct. See the example in Figure 7. 

The remaining issue is now to design these 10 
subspace vectors v_i, i = 1, ..., 10. The detailed 
construction comes from the MDS-code property that any three 
nodes out of five need to recover the whole data file. 
Observe in Figure 7 that nine distinct vectors can be 
downloaded from any three nodes. Hence, any (10, 9) 
MDS code can be used to construct these v_i's. In this example, 
using the parity-check code defined over GF(2), we can 
design the v_i's as follows: v_i = e_i for i = 1, ..., 9, and 
v_10 = [1, ..., 1]^t. It has been shown in [33] that this 
idea can be extended to an arbitrary (n, k) case. 
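
The (5, 3) construction can be reproduced in a few lines. The sketch below assigns one distinct block index to every unordered pair of nodes, which matches the placement described above, and represents the vectors v_i as integer bitmasks over GF(2); the function names and the bitmask encoding are illustrative assumptions, not part of [33].

```python
from itertools import combinations


def build_layout(n):
    """One distinct block index per unordered pair of nodes (nodes are 1..n)."""
    layout = {node: [] for node in range(1, n + 1)}
    pairs = combinations(range(1, n + 1), 2)
    for block, (i, j) in enumerate(pairs, start=1):
        layout[i].append(block)
        layout[j].append(block)
    return layout


def repair(layout, failed):
    """Each survivor hands over the single block it shares with the failed node."""
    lost = set(layout[failed])
    return sorted(b for node, blocks in layout.items() if node != failed
                  for b in blocks if b in lost)


def parity_check_vectors(m):
    """v_i = e_i for i = 1..m and v_{m+1} = all-ones, as GF(2) bitmasks."""
    vecs = {i + 1: 1 << i for i in range(m)}
    vecs[m + 1] = (1 << m) - 1
    return vecs


def rank_gf2(masks):
    """Rank over GF(2) of vectors given as integer bitmasks (XOR basis)."""
    basis = []
    for v in masks:
        for b in basis:
            v = min(v, v ^ b)  # clear the leading bit of b if it is set in v
        if v:
            basis.append(v)
    return len(basis)


layout = build_layout(5)
vectors = parity_check_vectors(9)

# Exact repair: the survivors return precisely the blocks the failed node held.
assert all(repair(layout, f) == layout[f] for f in layout)

# MDS property: any three nodes hold nine distinct blocks of full rank 9.
for trio in combinations(layout, 3):
    blocks = {b for node in trio for b in layout[node]}
    assert len(blocks) == 9
    assert rank_gf2(vectors[b] for b in blocks) == 9
```

Because each survivor forwards a stored block unchanged, repair in this sketch needs no arithmetic at the helper nodes.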

This construction can be interpreted as an optimal 
interference avoidance technique. To see this, observe 
in the figure that the number of desired blocks for 
exact repair matches the number of available equations 
that can be downloaded. Hence, the involvement of any 
undesired blocks (interference) precludes exact repair. A 
Fig. 8. Repairing a (4, 2)-MSR code, when node 1 fails [31]. 



natural question arises: can this interference-avoidance 
technique provide solutions to the other extreme MSR 
point? It turns out that a new idea is needed to cover 
this point. 

B. Exact-MSR Codes 

The new idea is interference alignment [28], [29]. The 
idea of interference alignment is to align multiple inter- 
ference signals in a signal subspace whose dimension 
is smaller than the number of interferers. Specifically, 
consider the following setup where a decoder has to 
decode one desired signal which is linearly interfered 
with by two separate undesired signals. How many linear 
equations (relating to the number of channel uses) does 
the decoder need to recover its desired input signal? 
As the aggregate signal dimension spanned by desired 
and undesired signals is at most three, the decoder can 
naively recover its signal of interest with access to three 
linearly independent equations in the three unknown 
signals. However, as the decoder is interested in only one 
of the three signals, it can decode its desired unknown 
signal even if it has access to only two equations, pro- 
vided the two undesired signals are judiciously aligned 
in a 1-dimensional subspace. See [28]-[30] for details. 

This concept relates intimately to our repair problem 
that involves recovery of a subset (related to the subspace 
spanned by a failed node) of the overall aggregate signal 
space (related to the entire user data dimension). This 
attribute was first observed in [31], where it was shown 
that interference alignment could be exploited for Exact- 
MSR codes. 

Figure 8 illustrates interference alignment for exact 
repair of failed node 1 for (n, k, d, α, γ) = (4, 2, 3, 2, 3) 
where the maximum file size of M = 4 can be stored. 
We introduce matrix notation for illustration purposes. 
Let a = (a_1, a_2)^t and b = (b_1, b_2)^t be 2-dimensional 
information-unit vectors. Let A_i and B_i be 2-by-2 
encoding matrices for parity node i (i = 1, 2), which 
contain encoding coefficients for the linear combination 
of (a_1, a_2) and (b_1, b_2), respectively. For example, parity 
node 1 stores blocks in the form of a^t A_1 + b^t B_1, as 
shown in Fig. 8. The encoding matrices for systematic 
nodes are not explicitly defined since those are trivially 
inferred. Finally, we define 2-dimensional projection 
vectors v_α1, v_α2, v_α3, since β = 1. 

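Let us explain the interference alignment scheme. First, 
the two blocks in each storage node are projected into a 
scalar with the projection vectors v_αi. By connecting to 
three nodes, we get: v_α1^t b; (A_1 v_α2)^t a + (B_1 v_α2)^t b; 
(A_2 v_α3)^t a + (B_2 v_α3)^t b. Here the goal is to decode 
2 desired unknowns out of 3 equations including 4 
unknowns. To achieve this goal, we need: 

$$\operatorname{rank}\begin{bmatrix} (A_1 v_{\alpha 2})^t \\ (A_2 v_{\alpha 3})^t \end{bmatrix} = 2; \qquad \operatorname{rank}\begin{bmatrix} v_{\alpha 1}^t \\ (B_1 v_{\alpha 2})^t \\ (B_2 v_{\alpha 3})^t \end{bmatrix} = 1.$$
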
Fig. 9. Repairing the (6, 3)-MSR code when a systematic node fails. 
A common eigenvector concept is employed to achieve interference 
alignment simultaneously. Goal: the matrix associated with a has rank 3, while the matrices associated with b and c each have rank 1. Idea: (i) Design the A_i's, B_i's and C_i's such that v_1 is a common eigenvector of 
the B_i's and C_i's, but not of the A_i's. 
(ii) Repair by having survivor nodes project their data onto a linear 
subspace spanned by this common eigenvector v_1. 




The second condition can be met by setting $v_{\alpha 2} = B_1^{-1} v_{\alpha 1}$ and $v_{\alpha 3} = B_2^{-1} v_{\alpha 1}$. This choice forces the interference space to be collapsed into a one-dimensional linear subspace, thereby achieving interference alignment. On the other hand, we can satisfy the first condition as well by carefully choosing the $A_i$'s and $B_i$'s. For exact repair of node 2, we can apply the same idea. For parity node repair, we can remap parity node information and then apply the same technique.
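The alignment step above can be sketched numerically. The following is a minimal illustration, not the code of Figure 8: the prime field GF(5), the encoding matrices `A1, B1, A2, B2`, the data vectors and the starting projection vector `v1` are all hypothetical values chosen only so that the arithmetic is easy to follow.

```python
P = 5  # assumed prime field size, chosen only for illustration

def inv2(m):
    """Inverse of a 2x2 matrix over GF(P)."""
    (a, b), (c, d) = m
    det_inv = pow((a * d - b * c) % P, -1, P)
    return [[d * det_inv % P, -b * det_inv % P],
            [-c * det_inv % P, a * det_inv % P]]

def matvec(m, v):
    """Matrix times column vector over GF(P)."""
    return [sum(x * y for x, y in zip(row, v)) % P for row in m]

def vecmat(v, m):
    """Row vector times matrix over GF(P)."""
    return [sum(v[i] * m[i][j] for i in range(2)) % P for j in range(2)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v)) % P

# Hypothetical encoding matrices for the two parity nodes
A1, B1 = [[1, 0], [0, 1]], [[1, 0], [0, 2]]
A2, B2 = [[2, 1], [0, 1]], [[1, 0], [0, 3]]

a, b = [3, 4], [2, 1]  # node 1 stores a (and fails), node 2 stores b
parity1 = [(x + y) % P for x, y in zip(vecmat(a, A1), vecmat(b, B1))]
parity2 = [(x + y) % P for x, y in zip(vecmat(a, A2), vecmat(b, B2))]

# Projection vectors: pick v1, then align the interference from b
v1 = [1, 2]
v2 = matvec(inv2(B1), v1)
v3 = matvec(inv2(B2), v1)

# One scalar is downloaded from each of the three survivors
y = [dot(b, v1), dot(parity1, v2), dot(parity2, v3)]

# Second rank condition: every interference direction equals v1
aligned = matvec(B1, v2) == v1 and matvec(B2, v3) == v1

# First rank condition: the two signal directions must be invertible
signal = [matvec(A1, v2), matvec(A2, v3)]
z = [(y[1] - y[0]) % P, (y[2] - y[0]) % P]  # cancel the aligned interference
recovered_a = matvec(inv2(signal), z)
print(aligned, recovered_a)
```

Because the interference from $b$ appears as the same scalar in all three downloads, subtracting the first download leaves two equations in the two unknowns of $a$.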

It turned out that this idea cannot be generalized to the arbitrary $(n, k)$ case: it provides optimal codes only for the case of $k = 2$. Recently, significant progress has been made: for the case of $k/n \le 1/2$, it has been shown that exact repair incurs no price in attaining the cutset lower bound.

Theorem 3 (Exact-MSR Codes [35]): Suppose the MDS code rate is at most $1/2$, i.e., $k/n \le 1/2$, and the degree $d \ge 2k - 1$. Then, the cutset bound can be achieved with interference alignment. The achievable scheme is deterministic and requires a finite-field alphabet size of at most $2(n - k)$.

A more sophisticated idea arises to cover this case: simultaneous interference alignment. Figure 9 illustrates the interference alignment technique through the example of $(n, k, d, \alpha, \gamma) = (6, 3, 5, 3, 3)$ where $\mathcal{M} = 9$. Let $a = (a_1, a_2, a_3)^t$, $b = (b_1, b_2, b_3)^t$ and $c = (c_1, c_2, c_3)^t$ be 3-dimensional information-unit vectors. Let $A_i$, $B_i$ and $C_i$ be 3-by-3 encoding matrices for parity node $i$ ($i = 1, 2, 3$). We define 3-dimensional projection vectors $v_{\alpha i}$ ($i = 1, \dots, 5$).

By connecting to five nodes, we get the five equations shown in the figure. In order to successfully recover the desired signal components of $a$, the matrix associated with $a$ should have full rank of 3, while the other matrices corresponding to $b$ and $c$ should each have rank 1. In accordance with the (4, 2) code example in Figure 8, if one were to set $v_{\alpha 3} = B_1^{-1} v_{\alpha 1}$, $v_{\alpha 4} = B_2^{-1} v_{\alpha 1}$ and $v_{\alpha 5} = B_3^{-1} v_{\alpha 1}$, then it is possible to achieve interference alignment with respect to $b$. However, this choice also specifies the interference space of $c$. If the $B_i$'s and $C_i$'s are not designed judiciously, interference alignment is not guaranteed for $c$. Hence, it is not evident how to achieve interference alignment for $b$ and $c$ at the same time.

In order to address the challenge of simultaneous interference alignment, a common eigenvector concept is invoked. The idea consists of two parts: (i) designing the $(A_i, B_i, C_i)$'s such that $v_1$ is a common eigenvector of the $B_i$'s and $C_i$'s, but not of the $A_i$'s; (ii) repairing by having survivor nodes project their data onto a linear subspace spanned by this common eigenvector $v_1$. We can then achieve interference alignment for $b$ and $c$ at the same time, by setting $v_{\alpha i} = v_1$, $\forall i$. As long as $[A_1 v_1, A_2 v_1, A_3 v_1]$ is invertible, we can also guarantee the decodability of $a$. See Figure 9.

The challenge is now to design encoding matrices to 
guarantee the existence of a common eigenvector while 
also satisfying the decodability of desired signals. The 
difficulty comes from the fact that in the (6, 3, 5) code 
example, these constraints need to be satisfied for all 
six possible failure configurations. The structure of ele- 
mentary matrices (generalized matrices of Householder 
and Gauss matrices) gives insights into this. To see this, 
consider a 3-by-3 elementary matrix $A$:

$$A = u v^t + \alpha I, \qquad (9)$$



where $u$ and $v$ are 3-dimensional vectors. Note that the dimension of the null space of $v$ is 2 and the null vector $v^{\perp}$ is an eigenvector of $A$, i.e., $A v^{\perp} = \alpha v^{\perp}$. This motivates the following structure:

$$\begin{aligned}
A_1 &= u_1 v_1^t + \alpha_1 I; & B_1 &= u_1 v_2^t + \beta_1 I; & C_1 &= u_1 v_3^t + \gamma_1 I \\
A_2 &= u_2 v_1^t + \alpha_2 I; & B_2 &= u_2 v_2^t + \beta_2 I; & C_2 &= u_2 v_3^t + \gamma_2 I \\
A_3 &= u_3 v_1^t + \alpha_3 I; & B_3 &= u_3 v_2^t + \beta_3 I; & C_3 &= u_3 v_3^t + \gamma_3 I,
\end{aligned} \qquad (10)$$

where the $v_i$'s are 3-dimensional linearly independent vectors and so are the $u_i$'s. The values of the $\alpha_i$'s, $\beta_i$'s and $\gamma_i$'s can be arbitrary non-zero values. For simplicity, we consider the simple case where the $v_i$'s are orthonormal, although these need not be orthogonal, but only linearly independent. We then see that $\forall i = 1, 2, 3$,

$$A_i v_1 = \alpha_i v_1 + u_i, \qquad B_i v_1 = \beta_i v_1, \qquad C_i v_1 = \gamma_i v_1. \qquad (11)$$

Footnote: Of course, five additional constraints also need to be satisfied for the other five failure configurations for this (6, 3, 5) code example.

Fig. 10. Illustration of exact repair for a (6, 3, 5) E-MSR code defined over GF(4) with generator polynomial $g(x) = x^2 + x + 1$: (a) exact repair of systematic node 1; (b) exact repair of parity node 1. The solution for systematic node repair is simple: setting all of the projection vectors as $(1, 1, 1)^t$. This enables simultaneous interference alignment, while guaranteeing the decodability of $a$. For our carefully chosen parameters, parity node repair is much simpler. For the repair, we download only the first equation from each survivor node to solve five linear equations containing only five unknowns.



Importantly, notice that $v_1$ is a common eigenvector of the $B_i$'s and $C_i$'s, while simultaneously ensuring that the vectors $A_i v_1$ are linearly independent. Hence, setting $v_{\alpha i} = v_1$ for all $i$, it is possible to achieve simultaneous interference alignment while also guaranteeing the decodability of the desired signals. On the other hand, this structure also guarantees exact repair for $b$ and $c$. We use $v_2$ for exact repair of $b$. It is a common eigenvector of the $C_i$'s and $A_i$'s, while ensuring that $[B_1 v_2, B_2 v_2, B_3 v_2]$ is invertible. Similarly, $v_3$ is used for $c$.
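The common-eigenvector property of structure (10) can be checked with a few lines of code. This is a minimal sketch under simplifying assumptions that are not in the source: it works over the integers rather than a finite field, takes the $v_i$'s to be the standard basis (hence orthonormal), and uses hypothetical values for the $u_i$'s and the scalars.

```python
def elementary(u, v, scale):
    """Return the matrix u v^t + scale * I as a list of rows."""
    m = [[ui * vj for vj in v] for ui in u]
    for i in range(len(u)):
        m[i][i] += scale
    return m

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical parameters: v_i = standard basis, u_i linearly independent
v = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
u = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
scalars = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # alpha_i, beta_i, gamma_i

# families[0] = [A_1, A_2, A_3], families[1] = [B_i], families[2] = [C_i]
families = [[elementary(u[i], v[j], scalars[j][i]) for i in range(3)]
            for j in range(3)]

report = []
for j in range(3):  # repair of a (j=0), b (j=1), c (j=2) uses v_j
    vj = v[j]
    # v_j must be an eigenvector of every matrix in the other two families
    aligned = all(
        matvec(families[f][i], vj) == [scalars[f][i] * x for x in vj]
        for f in range(3) if f != j for i in range(3)
    )
    # the desired family must map v_j to three independent vectors
    signal = [matvec(families[j][i], vj) for i in range(3)]
    report.append((aligned, det3(signal) != 0))

print(report)
```

Each entry of `report` pairs the alignment check for the two interfering families with the decodability check for the desired one.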

Parity nodes can be repaired by drawing a dual 
relationship with systematic nodes. The procedure has 
two steps. The first is to remap parity nodes with a', 
b', and c', respectively. Systematic nodes can then be 
rewritten in terms of the prime notations: 



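$$\begin{aligned}
a^t &= {a'}^t A'_1 + {b'}^t B'_1 + {c'}^t C'_1, \\
b^t &= {a'}^t A'_2 + {b'}^t B'_2 + {c'}^t C'_2, \\
c^t &= {a'}^t A'_3 + {b'}^t B'_3 + {c'}^t C'_3,
\end{aligned} \qquad (12)$$

where the newly mapped encoding matrices $(A'_i, B'_i, C'_i)$'s are defined as:

$$\begin{bmatrix} A'_1 & A'_2 & A'_3 \\ B'_1 & B'_2 & B'_3 \\ C'_1 & C'_2 & C'_3 \end{bmatrix} := \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \\ C_1 & C_2 & C_3 \end{bmatrix}^{-1}. \qquad (13)$$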
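The reason an inverse appears in (13) can be spelled out in one step. The derivation below is a sketch that assumes, following the remapping described above, that ${a'}^t$, ${b'}^t$ and ${c'}^t$ denote the contents of parity nodes 1, 2 and 3, and it writes $G$ (a symbol introduced here only for illustration) for the 9-by-9 block matrix of encoding matrices.

```latex
\begin{aligned}
\begin{bmatrix} {a'}^t & {b'}^t & {c'}^t \end{bmatrix}
  &= \begin{bmatrix} a^t & b^t & c^t \end{bmatrix} G,
  \qquad
  G := \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \\ C_1 & C_2 & C_3 \end{bmatrix}, \\
\begin{bmatrix} a^t & b^t & c^t \end{bmatrix}
  &= \begin{bmatrix} {a'}^t & {b'}^t & {c'}^t \end{bmatrix} G^{-1}.
\end{aligned}
```

Reading off the three block columns of $G^{-1}$ gives the three lines of (12). The matrix $G$ is invertible because the MDS property requires that the file be recoverable from the three parity nodes alone.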



With this remapping, one can dualize the relationship between systematic and parity node repair. Specifically, if all of the $A'_i$'s, $B'_i$'s, and $C'_i$'s are elementary matrices and form a similar code structure as in (10), exact repair of the parity nodes becomes transparent. It was shown that a special relationship between $[u_1, u_2, u_3]$ and $[v_1, v_2, v_3]$ through the correct choice of $(\alpha_i, \beta_i, \gamma_i)$'s can also guarantee the dual structure of (10) [35].

Fig. 11. Illustration of the scheme in [36].

Figure 10 shows a numerical example for exact repair of (a) systematic node 1 and (b) parity node 1 where $[v_1, v_2, v_3] = [2, 2, 2; 2, 3, 1; 2, 1, 3]$. This example illustrates the code structure that generalizes the code introduced in [34]. See [35] for details. This generalized code structure allows for a much larger design space for exact repair.

Notice that the projection vector solution for systematic node repair is simple: $v_{\alpha i} = 2^{-1} v_1 = (1, 1, 1)^t$, $\forall i$. This choice enables simultaneous interference alignment, while guaranteeing the decodability of $a$. Notice that $(b_1, b_2, b_3)$ and $(c_1, c_2, c_3)$ are aligned into $b_1 + b_2 + b_3$ and $c_1 + c_2 + c_3$, respectively, while the three equations associated with $a$ are linearly independent.
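The GF(4) arithmetic behind the projection vector can be reproduced as follows. This is a minimal sketch that assumes the usual labeling of field elements by bit patterns (2 stands for $x$ and 3 for $x + 1$, reduced modulo $x^2 + x + 1$); the data block `block` is a hypothetical value.

```python
def gf4_mul(a, b):
    """Multiply in GF(4) = GF(2)[x] / (x^2 + x + 1); elements are 0..3."""
    prod = 0
    for bit in range(2):
        if (b >> bit) & 1:
            prod ^= a << bit          # carry-less (polynomial) multiply
    if prod & 0b100:                  # reduce the x^2 term: x^2 = x + 1
        prod ^= 0b111
    return prod

def gf4_inv(a):
    """Multiplicative inverse of a nonzero element, by exhaustive search."""
    return next(x for x in range(1, 4) if gf4_mul(a, x) == 1)

def gf4_dot(u, v):
    acc = 0
    for x, y in zip(u, v):
        acc ^= gf4_mul(x, y)          # addition in GF(4) is bitwise XOR
    return acc

v1 = [2, 2, 2]
projection = [gf4_mul(gf4_inv(2), x) for x in v1]   # 2^{-1} v_1

# Projecting a stored block onto (1, 1, 1) simply adds its three symbols
block = [1, 2, 3]                     # hypothetical (b_1, b_2, b_3)
aligned_symbol = gf4_dot(block, projection)
print(projection, aligned_symbol)
```

Scaling $v_1$ by $2^{-1}$ does not change the subspace it spans, so the all-ones vector is just a convenient representative.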

The dual structure also guarantees exact repair of parity nodes. Importantly, we have chosen code parameters from the generalized code structure of [35] such that parity node repair is quite simple. As shown in Figure 10 (b), downloading only the first equation from each survivor node ensures exact repair. Notice that the five downloaded equations contain only the five unknown variables $(a'_1, a'_2, a'_3, b'_1, c'_1)$ and the three equations associated with $a'$ are linearly independent. Hence, we can successfully recover $a'$.

It has been shown in [35] that this alignment technique can be easily generalized to arbitrary $(n, k, d)$ where $n \ge 2k$ and $d \ge 2k - 1$.

IV. Model III: Exact Repair of the Systematic Part

In this section, we review the constructive scheme given in [36], which gives a construction of systematic $(n, k)$-MDS codes for $2k \le n$ that achieves the minimum repair bandwidth when repairing from $k + 1$ nodes.
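To make "minimum repair bandwidth when repairing from $k + 1$ nodes" concrete, the sketch below evaluates the minimum-storage point of the cut-set tradeoff. It assumes the standard expression from the regenerating-codes literature, $(\alpha, \gamma) = (\mathcal{M}/k, \ \mathcal{M} d / (k(d - k + 1)))$, which is not restated in this section, and it uses the file size of $2k$ symbols employed by the scheme below. The function name `msr_point` is hypothetical.

```python
from fractions import Fraction

def msr_point(file_size, k, d):
    """Return (alpha, gamma) at the minimum-storage point of the tradeoff:
    per-node storage M/k and total repair download M*d / (k*(d-k+1))."""
    alpha = Fraction(file_size, k)
    gamma = Fraction(file_size * d, k * (d - k + 1))
    return alpha, gamma

# d = k + 1 helper nodes and a file of 2k symbols
for k in (2, 3, 5):
    alpha, gamma = msr_point(2 * k, k, k + 1)
    naive = 2 * k  # naive repair: download the whole file and re-encode
    print(f"k={k}: alpha={alpha}, gamma={gamma}, naive download={naive}")
```

Under these assumptions each node stores 2 symbols and a repair downloads $k + 1$ symbols in total, one per helper, compared with $2k$ symbols for naive repair.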

The scheme is illustrated in Figure 11. Let $\mathbb{F}$ denote the finite field over which the code is defined. In Figure 11, $x \in \mathbb{F}^{2k}$ is a vector consisting of the $2k$ original information symbols. Each node stores 2 symbols, $x^T u_i$ and $x^T v_i$. The vectors $\{u_i\}$ do not change over time, but the $\{v_i\}$ change as the code repairs. We maintain the invariant property that the $2n$ length-$2k$ vectors $\{u_i, v_i\}$ form a $(2n, 2k)$-MDS code; that is, any $2k$ vectors in the set $\{u_i, v_i\}$ have full rank $2k$. This certainly implies that the $n$ nodes form an $(n, k)$-MDS code. We initialize the code using any $(2n, 2k)$ systematic MDS code over $\mathbb{F}$.
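The invariant can be checked mechanically on a small instance. The sketch below is an illustration under assumptions that are not in the source: it uses the prime field GF(11), $n = 4$, $k = 2$, and a systematic code built from the identity plus a Cauchy matrix (a standard way to obtain a systematic MDS code). The helper names are hypothetical.

```python
from itertools import combinations

P = 11  # assumed prime field size for the illustration

def rank_mod_p(rows, p=P):
    """Rank of a list of row vectors over GF(p), by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def is_mds(vectors, dim):
    """True if every choice of `dim` vectors has full rank `dim`."""
    return all(rank_mod_p(list(s)) == dim for s in combinations(vectors, dim))

n, k = 4, 2
xs, ys = [0, 1, 2, 3], [4, 5, 6, 7]  # distinct field elements
identity = [[int(i == j) for j in range(2 * k)] for i in range(2 * k)]
cauchy = [[pow((x - y) % P, -1, P) for x in xs] for y in ys]

u = identity          # u_1..u_4: systematic vectors, never change
v = cauchy            # v_1..v_4: may change during repairs
code_is_mds = is_mds(u + v, 2 * k)

# Any k nodes hold 2k vectors, so the n nodes form an (n, k)-MDS code
nodes_are_mds = all(
    rank_mod_p([u[i], v[i], u[j], v[j]]) == 2 * k
    for i, j in combinations(range(n), k)
)
print(code_is_mds, nodes_are_mds)
```

The second check is implied by the first, which is the point made in the text: the vector-level invariant is stronger than the node-level MDS property.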

Now we consider the situation of a repair. Without loss of generality, suppose node $n$ failed and is repaired by accessing nodes $1, \dots, k + 1$. As illustrated in Figure 11, the replacement node downloads $\alpha_i x^T u_i + \beta_i x^T v_i$ from each node of $\{1, \dots, k + 1\}$. Using these $k + 1$ downloaded symbols, the replacement node computes two symbols $x^T u_n$ and $x^T v'_n$ as follows:

$$\sum_{i=1}^{k+1} \left( \alpha_i x^T u_i + \beta_i x^T v_i \right) = x^T u_n \qquad (14)$$

$$\sum_{i=1}^{k+1} \rho_i \left( \alpha_i x^T u_i + \beta_i x^T v_i \right) = x^T v'_n \qquad (15)$$

Note that $v'_n$ is allowed to be different from $v_n$; the property that we maintain is that the repaired code continues to be a $(2n, 2k)$-MDS code. Here $\{\alpha_i, \beta_i, \rho_i\}$ and $v'_n$ are the variables that we can control. The following theorem shows that we can choose these variables so that (14) and (15) are satisfied and the repaired code continues to be a $(2n, 2k)$-MDS code.
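The repair step in (14) and (15) can be traced on a toy instance. The sketch below is not the constructive argument of [36]: it assumes the smallest case $k = 1$, $n = 3$ over the prime field GF(7), uses a hypothetical $(6, 2)$ code, and replaces the existence argument with an exhaustive search over the coefficients.

```python
from itertools import combinations, product

P = 7  # assumed prime field; the field-size condition of [36] is not modeled

def det2(a, b):
    return (a[0] * b[1] - a[1] * b[0]) % P

def is_mds(vectors):
    """With 2k = 2, MDS means every pair of vectors is independent."""
    return all(det2(a, b) != 0 for a, b in combinations(vectors, 2))

def combine(coeffs, vecs):
    return [sum(c * w[j] for c, w in zip(coeffs, vecs)) % P for j in range(2)]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b)) % P

# Hypothetical (2n, 2k) = (6, 2) code: node i stores x.u_i and x.v_i
u = [[1, 0], [0, 1], [1, 1]]
v = [[1, 2], [1, 3], [1, 4]]

def find_repair():
    """Search for coefficients repairing node 3 from nodes 1 and 2."""
    for a1, b1, a2, b2 in product(range(P), repeat=4):
        w = [combine([a1, b1], [u[0], v[0]]),
             combine([a2, b2], [u[1], v[1]])]
        if combine([1, 1], w) != u[2]:        # equation (14)
            continue
        for rho in product(range(P), repeat=2):
            v_new = combine(rho, w)           # equation (15)
            if is_mds(u + v[:2] + [v_new]):   # invariant must survive
                return (a1, a2), (b1, b2), rho, v_new
    raise ValueError("no valid assignment found")

alphas, betas, rhos, v_new = find_repair()

# Replay the repair on data: each helper sends a single symbol
x = [3, 5]
downloads = [(alphas[i] * dot(x, u[i]) + betas[i] * dot(x, v[i])) % P
             for i in range(2)]
first_symbol = sum(downloads) % P
second_symbol = sum(r * s for r, s in zip(rhos, downloads)) % P
print(alphas, betas, rhos, v_new, first_symbol, second_symbol)
```

The first repaired symbol equals the lost $x^T u_n$ exactly, while the second corresponds to a new vector $v'_n$ that may differ from the old $v_n$ but keeps the MDS invariant.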

Theorem 4 ([36]): Let $\mathbb{F}$ be a finite field whose size exceeds the bound given in [36]. Suppose the old code specified by $\{u_i, v_i\}$ is a $(2n, 2k)$-MDS code defined over $\mathbb{F}$. When node $n$ fails, there exists an assignment of the variables $\{\alpha_i, \beta_i, \rho_i\}$ such that (14) and (15) are satisfied and the repaired code continues to be a $(2n, 2k)$-MDS code.

Corollary 1 (A Systematic $(n, k)$-MDS Code): The above scheme gives a construction of systematic $(n, k)$-MDS codes for $2k \le n$ that achieves the minimum repair bandwidth when repairing from $k + 1$ nodes.

Proof: Consider $n \ge 2k$. Note that in the above scheme, we can initialize the code $\{u_1, \dots, u_n, v_1, \dots, v_n\}$ with any $(2n, 2k)$-MDS code. In particular, we can use a systematic code and assign the $2k$ systematic code vectors to $\{u_1, \dots, u_{2k}\}$. Since $\{u_1, \dots, u_n\}$ do not change over time, the code remains a systematic $(2n, 2k)$-MDS code. Thus the $n$ nodes form a systematic $(n, k)$-MDS code. The code repairs a failure by downloading $k + 1$ blocks from $d = k + 1$ nodes, with the total file size $\mathcal{M} = 2k$, achieving the cut-set bounds derived in Section II. ■

TABLE I
Known results for exact MBR and MSR codes. All points correspond to regimes where the cut-set bound region is known to be achievable.

              MBR                  MSR
Functional    [24]: for all n, k, d (MBR and MSR)
Hybrid        ?                    [34]: k/n <= 1/2, d >= 2k - 1
                                   [36], [33]: k/n <= 1/2, d = k + 1
Exact         [33]: d = n - 1      [35]: k/n <= 1/2, d >= 2k - 1

V. Discussion and Conclusions 

We provided an overview of recent results about the 
problem of reducing repair traffic in distributed storage 
systems based on erasure coding. Three versions of the 
repair problems are considered: exact repair, functional 
repair and exact repair of systematic parts. In the exact 
repair model, the lost content is exactly regenerated; in 
the functional repair model, only the same MDS-code 
property is maintained before and after repairing; in the 
exact repair of systematic parts, the systematic part is 
exactly reconstructed but the non-systematic part follows 
a functional repair model. 

The functional repair problem is in essence a problem 
of multicasting from a source to an unbounded number 
of receivers over an unbounded graph. As we showed 
there is a tradeoff between storage and repair bandwidth 
and the two extremal points are achieved by Minimum 
Bandwidth and Minimum Storage regenerating (MBR 
and MSR) codes. The repair bandwidth is characterized 
by the min-cut bounds and therefore the functional repair 
problem is completely solved. 

Problems that require exact repair correspond to network coding problems having sinks with overlapping subset demands. For such problems cut-set bounds are not tight in general and linear codes might not even suffice [22]. The recent work we discussed [33] showed that for MBR codes the repair bandwidth given by the cut-set bound is achievable for the interesting case of $d = n - 1$. The minimum-storage point seems harder to understand. The best known constructions [35] we presented match the cut-set bound for $k/n \le 1/2$ for the interesting regime of connectivity $d \in [2k - 1, n - 1]$.



A corresponding negative result [34] established that for $k/n > 1/2 + 2/n$, the cut-set bound cannot be achieved by interference alignment-based linear schemes. Table I summarizes what is known for the repair bandwidth region, and an online editable bibliography (wiki) can be found at [1]. All the cases marked correspond to regimes where the cut-set bound is known to be achievable. To the best of our knowledge there are no information theoretic upper bounds other than the cut-set bound, and it would be very interesting to see if the region could be universally achievable. Of particular interest is the case of exact Minimum Storage Regenerating codes for $d = n - 1$ and high rates.

In addition to the complete characterization of the repair rate region for storage, there are several other interesting open problems. A first problem is to investigate the influence of network topology, as initiated recently [38] for trees. All prior work so far has assumed a complete connectivity topology for the storage network. However, most networks of interest will have different communication capacities and sparse topologies. For these cases communication will have a different cost, and it would be interesting to formulate this as an optimization problem.

Secondly, the issues of security and privacy are important for distributed storage. When coding is used, errors can be propagated in several mixed blocks through the repair process [39] and an error-control mechanism is required. A related issue is that of privacy of the data, which can be compromised by information leakage to eavesdroppers during repairs [40].

Finally, small finite-field constructions require further investigation. While many of the constructions presented require a large finite-field size, practical storage systems would benefit from efficient binary operations. Recently Zhang et al. suggested a scheme for repairing Evenodd codes [14], which are binary codes with $n = k + 2$. While the proposed scheme does not match the cut-set bound, it improves on the naive repairing method of reconstructing all the data blocks. Constructing regenerating codes for small finite fields or designing repair algorithms for existing codes will be of significant practical interest.